Record Linkage
Abstract
Record linkage, in the present context, is simply the bringing together of information from two records that are believed to relate to the same entity, for example the same individual, the same family, or the same business. This might involve linking records within a single database to identify duplicate case records. Alternatively, it might involve linking records across two or more databases, for instance to merge them into a single database with improved coverage or scope. Record linkage is easiest when unique identification numbers (such as Social Security Numbers) are readily available. It is more challenging when only quasi-identifiers such as given name, surname, date of birth, and address are available. In combination, quasi-identifiers may uniquely identify an individual.
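As a rough illustration of the distinction drawn above, the following Python sketch links two small record sets: it compares unique identifiers when both records carry one, and otherwise falls back to an exact match on normalized quasi-identifiers. The `Person` type, its field names, the `normalize` rule, and the sample records are all assumptions made for this example, not part of the original work; real linkage projects typically need approximate or probabilistic matching rather than exact comparison.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """Hypothetical record layout used only for this illustration."""
    record_id: str
    given_name: str
    surname: str
    date_of_birth: str          # assumed ISO format, "YYYY-MM-DD"
    address: str
    id_number: Optional[str] = None  # unique identifier, if available


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def quasi_key(rec: Person) -> Tuple[str, str, str, str]:
    """Combine the quasi-identifiers into a single comparison key."""
    return (
        normalize(rec.given_name),
        normalize(rec.surname),
        rec.date_of_birth,
        normalize(rec.address),
    )


def same_entity(a: Person, b: Person) -> bool:
    """Use the unique identifier when both records have one,
    otherwise require all quasi-identifiers to agree exactly."""
    if a.id_number and b.id_number:
        return a.id_number == b.id_number
    return quasi_key(a) == quasi_key(b)


def link(left: List[Person], right: List[Person]) -> List[Tuple[str, str]]:
    """Return (left_id, right_id) pairs judged to be the same entity."""
    return [(a.record_id, b.record_id)
            for a in left for b in right if same_entity(a, b)]


# Made-up sample data for the sketch.
left = [
    Person("A1", "Maria", "Lopez", "1980-02-14", "12 Oak St", "000-00-0001"),
    Person("A2", "John", "Smith", "1975-07-01", "9 Elm Road"),
]
right = [
    Person("B1", "M.", "Lopez", "1980-02-14", "12 Oak Street", "000-00-0001"),
    Person("B2", " john ", "SMITH", "1975-07-01", "9  elm road"),
    Person("B3", "John", "Smith", "1990-01-01", "9 Elm Road"),
]

links = link(left, right)
```

In this sketch, A1 and B1 are linked through the shared identifier even though the name and address are written differently, while A2 and B2 are linked only because every quasi-identifier agrees after normalization. The pairwise loop compares every record on one side with every record on the other, which is acceptable for a toy example but would not scale to large files.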
Similar resources
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...
Record Linkage I: Evaluation of Commercially Available Record Linkage Software for Use in NASS
Record linkage is an important technique in NASS for minimizing the presence of duplicate names on its list sampling frame of farm operators and agribusinesses. In the late 1970s, NASS developed an automated record linkage system for this purpose, which runs on an IBM mainframe. With changes in technology, the need has arisen for portability between platforms, integration with client/server te...
A Decision Tree Based Record Linkage for Recommendation Systems
Record linkage merges all the records relating to the same entity from multiple datasets, at the entity level. It is the initial data preparation phase for most database projects. Traditionally, one-to-one data linkage is performed among entities of the same type that share a common unique identifier. The proposed one-to-many and/or many-to-many record linkage method is able to link the entities ...
Linkage Flooding: Ein Algorithmus zur dateninhaltsorientierten Fusion in vernetzten Informationsbeständen
This paper presents a specialized record linkage method (Linkage Flooding) that is optimized for finding duplicates in networked information collections. After a brief explanation of application scenarios for record linkage and an introduction to the record linkage process, the Linkage Flooding algorithm is described and experimental results on duplicate ...
TAILOR: A Record Linkage Tool Box
Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, ...
Probabilistic record linkage
Studies involving the use of probabilistic record linkage are becoming increasingly common. However, the methods underpinning probabilistic record linkage are not widely taught or understood, and therefore these studies can appear to be a 'black box' research tool. In this article, we aim to describe the process of probabilistic record linkage through a simple exemplar. We first introduce the c...